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Abstract — We apply Guo and Wang's relaxed belief propaga- 
tion (BP) method to the estimation of a random vector from 
linear measurements followed by a componentwise probabilistic 
measurement channel. Relaxed BP uses a Gaussian approxima- 
tion in standard BP to obtain significant computational savings 
for dense measurement matrices. The main contribution of this 
paper is to extend the relaxed BP method and analysis to general 
(non-AWGN) output channels. Specifically, we present detailed 
equations for implementing relaxed BP for general channels and 
show that relaxed BP has an identical asymptotic large sparse 
limit behavior as standard BP, as predicted by the Guo and 
Wang's state evolution (SE) equations. Applications are presented 
to compressed sensing and estimation with bounded noise. 

Index Terms — Non-Gaussian estimation, belief propagation, 
density evolution, compressed sensing, sparsity, bounded noise. 

I. Introduction 

Consider the problem of estimating a random vector x € M. n 
from the vector y G M. m shown in Fig. [T] As depicted in 
the figure, the input vector x is first passed through a linear 
transform, 

z = $x, (1) 

where <£> G M mxn is a known transform matrix, and then 
passed through an output channel or measurement channel 
described by a conditional distribution f>Y|z(y| z )- Suppose 
that the distributions of both the input vector x and output 
channel are separable in that the probability distributions 
factor as 

n m 
j=l i=l 

where px(xj) and PY\z{Vi\ z i) arc scalar distribution func- 
tions, and Xj, yi and Zi are the components of the vectors x, 
y and z, respectively. 

If m = n and the mixing matrix $ is the identity matrix, 
then the problem of estimating the input vector x from the out- 
put vector y reduces to n scalar estimation problems. However, 
for general $, optimal estimation of x is usually intractable 
because the transform matrix $ "couples" or "mixes" the n 
components of x into the m components of the output vector 
y. We thus call the problem of estimating the vector x from the 
coupled output vector y the linear mixing estimation problem. 

One natural approach to the linear mixing estimation prob- 
lem is belief propagation (BP), which iteratively updates 
estimates of the variables based on message passing along 
a graph (T), (2). In communications and signal processing, 
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BP is best known for its connections to iterative decoding 
in turbo and LDPC codes [3|-|5|. However, while turbo and 
LDPC codes typically involve computations over finite fields, 
BP has also been successfully applied in a number of problems 
with linear real-valued mixing, including CDMA multiuser 
detection (6), (7), lattice codes j8| and compressed sensing 

©-GD- 

A key theoretical justification for applying BP to the specific 
problem of linear mixing estimation came with the work of 
Montanari and Tse [12]. That worked considered BP estima- 
tion of binary ±1 vectors with AWGN measurements and large 
sparse random mixing matrices. In this setting, Montanari 
and Tse derived state evolution (SE) equations for the mean- 
squared error of BP as a function of the iteration number. Their 
analysis revealed that BP is asymptotically optimal in mean- 
square when the SE equations have a unique fixed point. The 
large sparse limit analysis was extended by Guo and Wang first 
to general priors and power levels (13) , and then to arbitrary 
(non-AWGN) output channels (14) . These results provided the 
first rigorous conditions for the optimality of BP for estimation 
with linear mixing and confirmed earlier predictions given by 
the replica method from statistical physics (15), (T6) . 

Guo and Wang's work [13| also presented the important 
result that the mean-square optimality of BP could be achieved 
by a significantly simpler algorithm that they called relaxed 
BP. One of the problems of applying standard BP to the 
linear mixing estimation problem is that the computations 
grow exponentially with the density of the transform matrix 
<£>. Relaxed BP overcomes this problem by using a Gaussian 
approximation of the messages to linearize the computations 
at the output nodes. Gaussian approximations had been used in 
earlier BP-based methods in CDMA multiuser detection (17)- 
|19] and also occasionally appear in the analysis and design 
of LDPC codes (20), (21). 

The main contribution of this paper is to extend the relaxed 
BP method and analysis: 

• Extensions to non-AWGN output channels: The relaxed 
BP algorithm described in Guo and Wang's first paper 
(13) considers only AWGN output channels. The sec- 
ond paper (T4) considers arbitrary output channels, but 
focuses on standard BP and only briefly mentions how 
to apply the relaxed BP approximations. In this paper, 
we work out the relaxed BP equations in full detail for 
general (non-AWGN) channels. Moreover, we offer some 
additional simplifications (see Section |IV-D[ ) to reduce 
the computations of relaxed BP even further. The ability 
to incorporate non-AWGN channels extends the scope 
of the relaxed BP method significantly. For example, it 
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mixing estimation problem. Section [TTT] reviews how to apply 
standard BP to estimation with linear mixing. The relaxed BP 
algorithm is introduced in Section [IV] The large sparse limit 
analysis is described in Section [V] Section VI presents some 
simple simulations of the algorithm to validate the analytic 
results. All the proofs are developed in appendices. 



Fig. 1. Linear mixing estimation problem. A random input vector x is 
transformed by a matrix <E>, and the transformed vector z is passed through 
a separable measurement channel yielding a final output vector y. The linear 
mixing estimation problem is to estimate the input vector x from the output 
vector y given the transform matrix prior px(xj) and measurement 
channel transition distribution PY\z(Vi\ z i)- 



enables the study of non-Gaussian noise processes, as 
well as discrete output channels that arise, for example, in 
pattern classification problems. We suggest some possible 
applications in Section |TI| 

Improved convergence analysis: A key result of Guo and 
Wang's state evolution (SE) analysis in 
that, when the measurement ratio j3 



(13) and (TJ is 
m/n is sufficiently 
small, relaxed BP asymptotically achieves the minimum 
mean-squared error (MSE) in the limit of large sparse 
random mixing matrices. Moreover, this minimum MSE 
is described by a unique fixed point to the SE equations. 
In this work, we extend the analysis to general j3. Specif- 
ically, we show that for any j3, there are upper and lower 
fixed point solutions to the SE equations. The asymptotic 
MSE of the relaxed BP method always converges to the 
upper fixed point, while the lower fixed point always 
provides a lower bound to the MSE of any estimator. 
Hence, in the case that the fixed point solution is unique, 
relaxed BP is asymptotically optimal. 
• Applications to compressed sensing and bounded noise 
estimation: Although relaxed BP was originally devel- 
oped for CDMA multiuser detection, the method can 
be applied to non-Gaussian estimation in a variety of 
applications. In this paper, we simulate relaxed BP for 
compressed sensing and estimation with bounded noise. 

An algorithm closely related to relaxed BP is the recently 
developed approximate message passing (AMP) method pro- 
posed in flT) and analyzed further in p2[ . The AMP algorithm 
generalizes the relaxed BP method and analysis in the special 
case of AWGN measurements. Unlike relaxed BP, the AMP 
algorithm can be applied with an arbitrary scalar estimation 
function so that the prior on the components of x do not 
need to be known. Also, as we will discuss in Section [V] 
the analysis of relaxed BP is only valid under a certain large 
sparse limit model. This model is an approximation to the case 
where $ is dense. The analysis of the AMP algorithm in | [22| 
provides rigorous results for dense measurement matrices. An 
interesting open problem is whether the analysis of AMP can 
be extended to general output channels considered here. 

A. Organization 

The remainder of this paper is organized as follows. In 
Section [II] we introduce some specific examples of the linear 



II. Examples and Applications 

The linear mixing model is extremely general and can be 
applied in a range of circumstances. We illustrate some simple 
examples for both the measurement channel and prior on x. 

A. Measurement Channel Examples 

a) AWGN output channel: For an additive white Gaus- 
sian noise (AWGN) output channel, the output vector y can 
be written as 

y = z + w = $x + w, (3) 

where w is a zero mean, Gaussian i.i.d. random vector 
independent of x. For this case, the corresponding channel 
transition probability distribution is given by 



PY\z(Vi 



<t>{Vi 



(4) 



where p, w > is the variance of the components of w and 
<fi(v ; p.) is the Gaussian distribution, 



(v, fi) 



: exp 



1 



(5) 



/2/kJl \ 2[i 

The AWGN channel is precisely the model considered by Guo 
and Wang in their original relaxed BP paper JTSJ. 

b) Non-Gaussian noise models: Since the output channel 
can incorporate an arbitrary separable distribution, the linear 
mixing model can also include the model ([3]) with non- 
Gaussian noise vectors w, provided the components of w 
are i.i.d. One interesting application for a non-Gaussian noise 
model is to study the bounded noise that arises in quantization. 
We will consider this application in the numerical simulations 
in Section IVFCl 

c) Logistic channels: A quite different channel is based 
on a logistic output. In this model, each output y; is or 1, 
where the probability that = 1 is given by some sigmoidal 
function such as 

Py\z{v% = %) = — —, r, (6) 

1 + aexp(— Zi) 

for some constant a > 0. Thus, larger values of result in a 
higher probability that yi = 1, 

This logistic model can be used in classification problems 
as follows (23): Suppose one is given m samples, with each 
sample being labeled as belonging to one of two classes. Let 
Hi = or 1 denote the class of sample i. Also, suppose that the 
ith row of the transform matrix $ contains a vector of n data 
values associated with the zth sample. Then, using a logistic 
channel model such as {5), the problem of estimating the 
vector x from the labels y and data $ is equivalent to finding 
a linear dependence on the data that classifies the samples 
between the two classes. This problem is often referred to 
as logistic regression and the resulting vector x is called the 
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Fig. 2. Factor or Tanner graph for the linear mixing estimation problem. 



regression vector. By adjusting the prior on the components 
of x, one can then impose constraints on the components of 
x including, for example, sparsity constraints. 

B. Examples of Priors 

d) Sparse priors and compressed sensing: As discussed 
in the introduction, one class of priors that we will consider 
in some detail in the simulations is sparse distributions. A 
vector x is sparse if a large fraction of its components are 
zero or close to zero. Sparsity can be modeled probabilistically 
with a variety of heavy-tailed distributions including Gaussian 
mixture models, generalized Gaussians and Bernoulli distribu- 
tions with a high probability of the component being zero. The 
estimation of sparse vectors with random linear measurements 
is the basic subject of compressed sensing [24|-[26| and fits 
naturally into the linear mixing framework. 

e) Discrete distributions: The linear mixing model can 
also incorporate discrete distributions on the components of x. 
Discrete distribution arise often in communications problems 
where discrete messages are modulated onto the components 
of x. The linear mixing with the transform matrix $ comes 
into play in CDMA spread spectrum systems and lattice codes 
mentioned above. 

III. Review of Standard Belief Propagation 

Before describing the relaxed BP algorithm, it is useful to 
first review how standard BP would be applied to the linear 
mixing estimation problem. Standard BP associates with the 
transform matrix $ a bipartite graph G = (V,E), called 
the factor or Tanner graph illustrated in Fig. [2] The vertices 
V in this graph consists of n "input" or "variable" nodes 
associated with the variables Xj, j — 1, . . . , n, and m "output" 
or "measurements" nodes associated with the observations yi, 
i = 1, . . . ,m. There is an edge E E between the input 
node Xj and output node yt if and only if j ^ 0. 

Given this graph, define the neighbor sets of the input and 
output nodes as 



N out (i) 



{i 
{J 



(M')e£), 



so that Ni n (j) is the set of neighbors of the input index j, and 
N ou t(i) is the set of neighbors of the output index i. 

The standard BP algorithm works by iteratively passing 
"messages" along the edges of this graph represented as 
probability distributions on the variables Xj. The messages 
are sometimes called beliefs. For the linear mixing estimation 
problem, the standard BP algorithm can be described as 
follows: 

1) Initialization: Set t = 1 and initialize the outgoing 
messages from the input nodes to 

Pi<-j (*' x i )=Pj( t > x j) = Px Oj ), (8) 

for all input node indices j and edges G E. This 
initialization simply sets the messages to the priors on 
the variables Xj. 

2) Mixing update: For each edge E E, compute 
p*^yj(t, Zi-).j), the distribution of the random variable 

Zi^-j ^ ^ir-^rj (9) 

rGiV out (i)#j 

assuming the variables independent with distri- 

butions x r ~ pf<_ r (t, x r ). Here, the sum is over indices 
r E N out (i) with r ^ j. Also, for each i, compute 
pf(t,Zi), the distribution of the random variable 



E 

reN out (i) 



(10) 



3) Output update: For each edge E E, compute the 

likelihood function 



PY\z(Vi I Ui + Zi->j) 

(11) 



(7a) 
(7b) 



4) Input update: For each edge E E, compute the 

distribution 

Pi^j(t+hXj) <xp x (xj) Pt-n(t,*ijXj). (12) 

Here, the oc sign indicates that the distribution is to be 
normalized so that it has unit integral. Also, compute 
the total distribution 

pj(t + l,Xj) (X Px{Xj 

Increment t = t + 1 and return to step 2 until a sufficient 
number of iterations have been performed. 

When the graph G is acyclic, then it can be shown that 
the distributions p~(xj) and pf(zi) eventually converge to the 
true marginal distributions of the random variables Xj and Zi 
given the observations y. However, for graphs with cycles, the 
BP algorithm in general only returns an approximation to the 
true marginals. An analysis of the BP algorithm is beyond the 
scope of this work and is covered extensively elsewhere. See, 
for example, p], |2) and (27) . 

What is important here is to recognize the complexity of 
the algorithm. The difficult step is the computations of the 
distributions of the variables z^j and Zi in (|9]l and ( 10 1 in 
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the mixing update. Suppose the output node t/, has in-degree 
d. That is, there are d indices j such that $^ is non-zero. 
Then, the evaluation of the distributions on z^j and Zj involve 
the integration over d — 1 and d components x r . Since the 
complexity of this computation grows exponentially in d, the 
BP algorithm is only tractable when the transform matrix $ is 
sparse (i.e., d is small). The point of relaxed BP is to provide 
an approximation to BP suitable for large, dense $. 



IV. Relaxed Belief Propagation 
A. Scalar Estimation Functions 

Before describing the relaxed BP algorithm, we need to 
define certain functions related to scalar estimation problems 
at the input and output nodes. At the input nodes, we consider 
the problem of estimating a scalar random variable x ~ px (x) 
from some scalar observation of the form 



+ v, v~Af(0,n), 



(14) 



where // > is a noise-level and v is additive Gaussian 
noise independent of x. Let F; n (g,/Lt) and & ln (q,n) be the 
conditional mean and variance of the random variable x given 
the scalar observation q. Although, the functions Fi n (q, /i) and 
£in(<7>M) ma Y not nave closed form expressions, they can be 
evaluated with one-dimensional integrals. 

To analyze the output nodes, suppose that z is a scalar Gaus- 
sian random variable z ~ A/"(z, //) and y has the conditional 
distribution Py\z(u\ z )- Let Py\z ^(vl^ A*) ^> e tne likelihood 
function 



Py\z^(v\z^) = J Py\z{v\z)4>{z-z; n) dz, 



(15) 



where <j)(z — z ; p) is the Gaussian p.d.f. in |5| with mean 
z and variance /i. The relaxed BP algorithm is based on the 
derivatives of log likelihood or score function 

gr 

D r (y,z,p) = - log p yl 2 ^(y\z,iJ,), (16) 

for r > 0. Again, this function can in general be evaluated 
numerically. 



B. Algorithm 

We can now describe the relaxed BP algorithm. The algo- 
rithm produces a sequence of estimates Xj(t), t = 0, 1, . . . for 
each variable well as estimates % (t) for each transformed 
variable z^. Several other intermediate estimates and variances 
are also computed. The steps are as follows: 

1) Initialization: Set t = 1, and for every input node index 
j and every E E, initialize 



(t) 



x 3 {t) 



*^init ) 
Minit' 



(17a) 
(17b) 



where x- ln it and nf nit are the mean and variance of the 
prior px(x). 



2) Output node, linear step: For every <E E compute 



%-+j{t) = ^2$>i r Xi<-r(t), (18a) 
= J2\<Z> ir \ 2 ^ r (t). (18b) 

Also compute %(t) and /uf(i) similarly, but with the 
summation over all r € {1, . . . , n}. 

3) Output node, non-linear step: For every G E 
compute 

st. -ft) - ^fo'W*). /£+,-(*)) 19 , 
Ui^j{t — — — - - — — — j — — , (tyaj 

D2(yi,Zi^. j (t),(j,f^ j {t)) 

where D r (y, z, (ti) are the derivatives of the negative log 
likelihood function in (TTSJ. 

4) not/e, linear step: For every (i, j) £ J5 compute 



E 



(20b) 



Also, compute (fo*(t) and A<j(£) similarly, but with the 
summation over all I £ {1, . . . , to}. 
5) Input node, non-linear step: For every £ E 

compute 

xu-jit+1) = F in (q u - J (t),nl_ j (j;)), ( 21a ) 
/£_,(*+!) = £ h M^(t),^(t)). (21b) 

Similarly, for every j — 1, . . . ,n, compute Xj (t+1) and 
fij(t+l) using %{t) and /uj(t). Set t = t + 1 and return 
to step 2. 



C. Heuristic Justification 

Although we will formally analyze the relaxed BP algorithm 
below, it is useful to first provide a heuristic understanding 
of the steps. The relaxed BP algorithm is a simplification of 
the standard BP method where only the means and variances 
of the probability distributions are passed. Specifically, the 
terms Xi^j(t) and fxf^At) are approximations of the mean 



13 i in the 



lzed based 



and variance of the distribution p^^_j{t,Xj) in 
standard BP algorithm. In step 1, these are initia 
on the prior px(xj). The relaxed BP approximation does not 
assume that p^j(t,Xj) itself is Gaussian. However, relaxed 
BP does assume that there is a sufficient number of terms in 
the summation in (|9j) that p^j(t, Zj_>.j) of the standard BP al- 
gorithm is well-approximated as Gaussian. The terms (t) 
and juf_j,j(t) in ( 18 1 in step 2 of the relaxed BP algorithm are 
the mean and variance of this Gaussian distribution. Under 
this Gaussian assumption, the likelihood function (t, Ui) 



in ( 1 1 1 is approximately given by 



Pi- 



j (t, Ui) ps Py\z,fi (Vil^i-H (*)> VUj (*))> 



(22) 
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where p Y \z uivl^i I 1 ) ' s gi yen i n < 15 I. Then, using the deriva- 
tives in 
by 



T6l, the second order approximation of ( 22 1 is given 



log pt+At,Ui 



1 



*(*)' 



+ 0(\u l \ 3 ) + const, 



(t)\ 2 



(23) 



where Ui^.j{i) and ix\^At) are given in (19i in step 3 of the 



relaxed BP algorithm and the constant termaoes not depend on 
iti. Now summing the log likelihoods in (23 1, the distribution 

Pi^j(t,Xj) in (13 1 is given by 

logpf<_,-(i, xj) = const + logp x (xj) 



E 1 



q i ^ j (t)\ 2 + const, (24) 



where qi^j(t) and fi^At) are the outputs (20i. The ap- 



proximation in ( |24[ > is due to the fact that the sum of the 
0(\$>ijXj\ 3 ) terms is asymptotically negligible for large n. 
This implies that 



Pi^At,Xj) acp x (xj)<l>(xj 



where again (/)(■ ; •) is the Gaussian distribution in pj. This 
implies that, conditional on Xj, q^jif) is distributed as 
Af(xj, Mi^-j ;(*))• The final step, step 5, uses the scalar es- 
timation functions in Section IV-A to compute Xi^j(t) and 
fif^j(t), the mean and variance of Xj given %j_j(t). 

D. Algorithm Complexity and Simplifications 

We next consider the complexity of the relaxed BP algo- 
rithm. The most computationally demanding steps are the non- 
linear mean and variance computations in F- m (-), £;„(•) in 
and the derivatives, D r (-) of the log likelihood function 



(21 



in ([19 1. Each of these functions can be computed by a one- 
dimensional numerical integral. Moreover, each iteration of 
the relaxed BP algorithm requires exactly one evaluation of 
the input node functions and one evaluation of the output log 
likelihood derivatives per edge of the Tanner graph. Thus, the 
computations grow linearly with the density of the graph and 
unlike standard BP, the relaxed BP algorithm is tractable even 
for dense matrices 

The relaxed BP algorithm can actually be further sim- 
plified with some small approximations. Following Tanaka 
and Okada's approximate BP algorithm in JTSJ, the relaxed 
BP algorithm can be approximately implemented using one 
evaluation of the input and output nonlinear functions per 
vertex, as opposed to one evaluation per edge. 

To implement this simplified relaxed BP algorithm, we first 
assume that the outgoing variances are the same to all desti- 
nations. Specifically, we replace the variance computations in 
the relaxed BP algorithm with 



Mf«-i(*) 



1 



mi 



where /Lt| (t) and ^ (t) are still computed as in the relaxed BP 
algorithm. In this way, the variance function £m(-) and second 
derivative D 2 (y, z, p) are each only computed once per vertex 
per iteration, as opposed to once per edge. 

To reduce the evaluations of the input node function F- m (-), 



we first observe from ( 20 » and the definition of Qj(t) that 



/'•;■ 



»(*) 



Therefore, we can approximate the update in pTj ) with 

' /(*) =-Fin@<-jV 

F in (^(t),/ij(t)) 



^14— J \ 



MP^jit)) 



,(25) 



Q r =?3 (*) 



where we have used a first order approximation for F- m (-). 
Moreover, the partial derivative can be evaluated with the 
following lemma. 



Lemma 1: The input MSE function defined in Section IV-A 
satisfies 

d 1 
— F in (q,p) = -E in (q,p). 
dq [i 

Proof: See Appendix [C] 
For the output node, we can use the fact that 



(26) 



Zi-tj(t) = %{t) - $ijXi<-j(t). 

Therefore, we can make the approximation 



;(*) = 



D 1 (y i ,z i ^ j (t),pl(t)) 

£>i(lK. *<(*). 

. j (t)D 2 (y i ,z i (t),pl(t)). (27) 



<&ijXi 



Using ( |25| l and ( |27] >, we only need to evaluate the nonlinear 
functions F- m (-) and Di(-) once per edge. The analysis that we 
will present later does not apply to the relaxed BP algorithm 
with these approximations. Nevertheless, we will see in the 
simulations that the approximate relaxed BP algorithms behave 
closely to the exact implementation. 

V. Large Sparse Limit Analysis 

A. Modeling Assumptions 

We analyze the relaxed BP algorithm in the large sparse 
limit developed in p2)-fl4). The large sparse limit model 
considers a sequence of problems parameterized by n and d. 
For each n and d, the transform matrix $ = <&(«, d) £ R mxrl 
is of the form 

$=^AS 1 / 2 , S = diag( Sl ,...,s„), (28) 
V d 

where A e fljmx™ an d s € R nxn are two matrices, and 
to = m(n) is a deterministic function of n. The number of 
measurements is assumed to grow linearly in n in that 

n 



lim 

n->oo myn) 







(29) 



6 



RELAXED BELIEF PROPAGATION 



for some /? > 0. 



The components s, in (28 i are called the scale factors and 



are assumed to be i.i.d. with some probability distribution 
Ps{sj) that does not depend on n or d. We assume sj > 
almost surely. The diagonal scale factor matrix S is used to 
scale the powers of the components of x. Specifically, multi- 
plication by S 1 / 2 scales the variance of the jth component of 
x by a factor s,-. In this way, the scale factors can be used to 
capture variations in the power of the components of x that 
are known a priori at the estimator. Variations in the power 
of x that are not known to the estimator should be captured 
in the distribution of x. 

The matrix A is deterministic, and we evaluate the perfor- 
mance of the relaxed BP algorithm on a deterministic sequence 
of input and output indices i — i(n,d) and j = j(n,d). The 
sequence of matrices A and indices i and j are assumed to 
satisfy the following conditions: 

Assumption 1: Let A = A(n, d) be a sequence of deter- 
ministic matrices in the factorization of $ = Q(n,d) in (28 1. 
Let i = i(n,d) and j — j(n,d) be a deterministic sequence 
of indices. Let t > be some fixed iteration number of the 
relaxed BP algorithm. Assume that A, i and j satisfy the 
following: 

(a) For every n and d, is an edge in the Tanner graph 
G associated with 

(b) The computation subgraphs Gi(t) and Gj(t) of the 
Tanner graph taken a depth of 2t hops from the output 
node i and input node j are trees. Precise definitions of 
these computation subgraphs are given in Appendix [D] 

(c) All the nodes in the subgraphs Gi(t) and Gj(t) have 
degrees bounded above by d. 

For all output nodes i in the subgraph Gi(t), we have 
the limits 

1 



(d) 



lim lim , 

d->oo n->oo d 



E 



\at r 



= (30a) 



lim lim 

d— >oo n-toc 



1 

#72 



r£N aut (e) 

E 

r£N out (i) 



a.f r 



= 0. 



(30b) 



For all input nodes r in the subgraph Gj(t), we have the 
limits 

1 



lim lim 

d— too n— yoo d 
1 



E 

£6iVi n (r) 



\atr\ 



1, 



(30c) 



lim lim — "-r- \^ 

d->oo ii->txi a 1 



a£ r \ 



0. (30d) 



As in 1 12|-[14j, the key assumption is that the Tanner graph 
G associated with the transform matrix $ is locally tree-like 
around the components of interest i and j. The assumption 
is common in the study of BP algorithms as it makes the 
messages independent. This local tree-like property is only 
possible with the graph being sparse. This sparsity assumption 
is brought out explicitly by bounding the input and output 
degrees of the Tanner graph. 

Assumption [T] uses a deterministic model for the A as 
opposed to the random matrix model with i.i.d. components 



of the proofs. In particular, the input and output degrees are 
deterministically bounded as opposed to be bounded on aver- 
age - which simplifies some of the convergence arguments. 

In the large sparse limit analysis, we first let n ,— > oo with 
m growing linearly with n and keeping d fixed. This enables 
the local-tree like properties. We then let d — > oo, which will 
enable the use of a Central Limit Theorem approximation. 

This order of limits is critical. Unfortunately, to analyze 
dense matrices, one would like an analysis where d can grow 
with n. Indeed, if the matrix is completely dense, we would 
like d — m(n). Unfortunately, the large sparse limit analysis 
that we rely on here requires that we consider the two limits 
separately; it thus represents an approximation to the actual 
problem. Nevertheless, we will see in simulations that large 
sparse limit analysis appears to predict the behavior with dense 
matrices as well. 

More sophisticated analysis techniques developed recently 
in p2) enable the study of dense matrices without the order 
of limits above. One possible avenue of future research would 
be to see if that analysis can be applied to the relaxed BP 
algorithm with general (non-AWGN) output channels as well. 

B. Large Sparse Limit Convergence 

Under the large sparse limit model, define the random 
vectors 



9f^(n,d,t) 
8j(n,d, t) 
9U,(n,d, t) 



'■(n,d,t) = 



(xj , sj , x i<r - j (t) , (j,?^ (t)) (31a) 

{xj^j^jit),^^)) (31b) 

(^.^(O./iJUiC*)) (31c) 

{zi,Zi{t),p4{t)), (3 Id) 



where the dependence on n and d on the right-hand side of the 
equations is implicit. Our goal is to describe the large sparse 
limit behavior of these random vectors. 

A key result of [14| is that the large sparse limit behavior 
of BP is described by a set of simple state evolution (SE) 
equations, which can be described as follows: Given & ln (q,p) 
in Section [IV-A[ define 

£m(v,s) = E[£ in (q,fi/s)\s] (32a) 
£ in (p) = E[s£ in (<z,M/s)], (32b) 

where the expectation is taken over the scalar random variables 
s ~ Ps{ s ) an d Q given by ( 14 1 with x ~ px(x). We will call 
£j n (^) the input node MSE function. In addition to the works 
p3| , p4) , this function appeared in Guo and Verdu's replica 
analysis of MSE estimation [16] and related works |28|, [29|. 
Variants also appear in the analysis of the AMP algorithm 

CD. (ID- 

At the output node, let 



/4it = /3E(sK ni 



(33) 



studied in [12|-[14]. The deterministic model simplifies some 



where /if nit is variance of Xj according to the prior px(xj), 
and the expectation is over s ~ ps( s )- Then, for p < /if nit , 
Guo and Wang [14] define the output node MSE function as 

E [D 2 (y,z,fj,)] 
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where £>2(y,2, /j,) is the derivative (16i of the score function. 
The expectation in (|34| is taken over 



(35) 



{z,z)~tf(p,P„(ji)), 
where P z {n) is the covariance matrix 



P (u) = [ ^init ^init A* 



(36) 



and the conditional distribution of y given z is given by 

py\z(v\z). 



Now consider the recursion 



(J, x (t + l,s) = £ ta (jJ?(t),8), 



(37a) 
(37b) 
(37c) 



defined for t > 1. We can also write (37i with the single 
equation 

fi z (t + l) = f3£ in \£ out (p z (t))]. (38) 

In p4) , the equations p7j ) (or the single equation version ( |3~8"| l) 
are called the state evolution equations for BP as they describe 
the evolution of the error variances. 

We consider two possible initial conditions for this recur- 
sion: one low value and one high value. The low sequence will 
be initialized with ji z (l) = A*f Q (l) = 0, and the high sequence 



will be initialized with fJ, z (l) = Mhi(l) = A'fnit m '33 1. We 



will use the subscripts as in (J>f (t) and /x£j(t) to differentiate 
between the two sequences. 

Now, for t E Z + , let 9 x (t) be the random vector 



"■(t) = (x,s,F in (q,fj,),£ in (q,fj,)) , 



(39) 



where x ~ px(x), s ~ Ps( s )> q is distributed as (14i, and 
/i = n q (t — l)/s with [i q (t — 1) being the (deterministic) 



quantity in the state evolution (SE) equation (37 1. To initialize, 
let 



9 x (l) = (£,S,Z lnlt ,/4it): 



(40) 



where x ln it and (j,f nit are the mean and variance of the prior of 
px(x). We will see below that when we use n q (t) — /ujL(i), 
the SE output with the "high" initial condition 9 X (t) describes 
the limit of the random vector 9f^_j (n, d, t) in (31 1. When 
fi q (t) = /if (t), 9 x (t) is the limit of a related quantity for a 
certain "genie-aided" algorithm (see Appendix |B]>. 
Also, for all t g Z + , define the random vector 



(41) 



where /i z (<) is the output of the state evolution equations (37i 
and («,J)~jV(0,P,(Ai'(t))). 

Theorem 1: Consider the relaxed BP algorithm under the 
large sparse limit model above with transform matrix $ and 
indices i and j satisfying Assumption[T]for some^kea! iteration 
number t. Then: 



(a) The random vectors in ( [31} converge in distribution as 
follows: 

lim lim ef,Jn,d,t) = 9 x {t) (42a) 
lim lim 9 x (n,d,t) = 9 x {t) (42b) 

lim lim 6f.An,d,t) = 6 z (t) (42c) 
lim lim 9 z (n,d,t) = 8 z (t), (42d) 

d— >oo n— >oo 

where the random vectors 9 X (t) and 9 Z (t) are defined as 
above with n q (t) = fj^(t) and fi z (t) = fj,^(i). 

(b) The error variances satisfy the limits 

lim lim E \\xj - Xj(t)\ 2 \sj = s] = fi^it, s), (43a) 

d— ¥oo n— >oo 



lim lim E[|zi-£-(t)| 2 ] 



(43b) 



where s) and Mhi(*) are me output of the SE 

equations (|37| with the "hi" initial condition, 
(c) The minimum conditional error variance of Xj and Zi 
given $ and y satisfy the asymptotic lower bounds 



lim lim E [var(a; ? |y, $)|s 

d— too n— >oo 

lim lim E[var(z 4 |y,$)] > ^(t), 

d—too n— >oo 



s]>fif Q (t,s), (44a) 
(44b) 



where /if (t,s) and nf a (t) are the output of the SE 

equations (|37| with the "lo" initial condition. 

Proof: See Appendices [E] and |F| ■ 

The performance bounds in parts (a) and (b) are largely 
identical to the results in fT4) except that they apply to 
relaxed BP instead of BP. This is our main result: in the 
large sparse limit model, relaxed BP and standard BP have 
the identical asymptotic behavior. The lower bound in part (c) 
of the theorem is also very close to results in [14| and just 
repeated here for completeness. 

Part (a) of the theorem provides a simple scalar char- 
acterization for this asymptotic behavior. Specifically, using 
the definition of 9 X (t) in ( 39 1, Theorem [T] shows that the 
componentwise behavior of the relaxed BP follows a scalar 
equivalent model as shown in Fig. [3] The component Xj is 
first corrupted by Gaussian noise yielding a noisy component 
qj. The relaxed BP estimate Xj(t) then behaves identically 
to the optimal scalar MMSE estimate of xj from the AWGN 
measurement qj. From this scalar equivalent joint distribution 
of the components and their estimates, one can compute any 
componentwise separable performance metric such as mean- 
squared error or probability of detection. 

The effective Gaussian noise levels in the scalar models 
are described by and fxiAt) from the state evolu- 

tion equations (37 1. Since the state evolution equations can 
be evaluated easily with numerical integration, Theorem [T] 
thus provides a simple, computationally-tractable method for 
exactly characterizing the performance of the relaxed BP 
algorithm. 

Part (b) shows that the SE outputs s) and 

respectively describe the asymptotic estimation error on the 
components Xj and prediction error on the outputs 2j. Part (c) 
provides corresponding lower bounds on these error variances 
for any estimator. 
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Fig. 3. Equivalent scalar model of the relaxed BP algorithm. The asymptotic 
behavior of the relaxed BP estimate Xj (t) of a component xj is identical to 
the output of an MMSE estimator with AWGN noise. The effective noise is 
scaled by the factor Sj corresponding to the component Xj. 



C. Convergence over Iteration and Mean-Square Optimality 

Theorem [T] describes the asymptotic behavior of the relaxed 
BP algorithm for a fixed iteration number t. Our second result 
describes the behavior of the relaxed BP estimates as t — > oo. 



Theorem 2: Consider the state evolution equations (37i. 
Suppose that £i n (fi) and S ou t(p) are continuous. Then, the SE 
equations have at least one fixed point with < fi z < /Lt? nit . 
Also: 

(a) With the "hi" initial condition, p z (l) = /^(l) = Atfnif 
the sequences Mhi(*)> A*hi(*' s ) anc ^ Mhi(*) decrease mono- 
tonically to the largest fixed point of the SE equations 
1(37). 

(b) With the "lo" initial condition, p, z (l) = /xf (l) = 0, the 
sequences p,f (t), pf a (t,s) and nf (t) increase monotoni- 
cally to the smallest fixed point of the SE equations (37 1. 



Proof: See Appendix O 



The theorem is similar to the convergence result in (13| 
except that it applies to all /?. The importance of the result 
is that it shows that relaxed BP provably converges in the 
limit of large iterations, and the asymptotic error variance of 
relaxed BP and the corresponding error lower bounds are both 
fixed points of the SE equations. A corollary of this result 



is that, when the fixed points of the SE equations ( 37 1 are 



unique, the error variance of relaxed BP and the corresponding 
lower bound agree. The result thus gives an easily verifiable 
condition under which relaxed BP is asymptotically mean- 
square optimal. 



VI. Numerical Simulations 

The large sparse limit analysis of the relaxed BP algorithm 
in Section [V] is theoretically exact only in the asymptotic 
limit of large dimensions. Also, the analysis assumes a certain 
scaling where the measurement matrix $ remains sparse. To 
test the accuracy of the large sparse limit model for finite 
problems with dense <1>, we conducted the following simple 
numerical experiments. 



A. Gauss-Bernoulli Prior with an AWGN Output Channel 

In the first experiment, the vector x was generated with i.i.d. 
components Xj with a Gauss-Bernoulli distribution given by 



M(o, i/p) 





with prob = p, 
with prob = 1 — p. 



(45) 



Here p is the sparsity ratio and represents the average fraction 
of non-zero components in x. The experiments below used the 
value p — 0.1. We chose this Gaussian mixture model since 
it is a simple example of a sparse prior used in compressed 
sensing. This prior is also used in the numerical validation of 
the replica method in (28). 

The components of the measurement matrix <1> were gener- 
ated as i.i.d. zero-mean Gaussians. Even though this matrix is 
dense, we will see that the large sparse limit analysis predicts 
the behavior of the relaxed BP estimator well. 

For the measurement channel in this first experiment, we 
assumed an AWGN output channel Q, where the noise w also 
has i.i.d. zero-mean Gaussian components. The noise variance, 
fi w , of the components of w was selected such that SNRrj = 
10 dB, where SNRq is the signal-to-noise ratio, 



SNR = 10 log- 



in 



E||<N| 2 
np w 



(46) 



As discussed in |28), SNR is the effective SNR that an 
estimator would see in estimating any one component of xj 
with the other n — 1 components of x known. 

Fig. |4] shows the median normalized squared-error (NSE) as 
a function of the iteration number in the relaxed BP algorithm 
for this model. In all the numerical experiments, we used 
the relaxed BP algorithm with the simplifications described 
in Section IIV-DI The simulation was conducted with 1000 
random realizations of the problem, and for each realization, 
we measured the NSE given by 



NSE = 10 log 



10 



Ex 



where x is the estimate of x. The NSE represents the average 
error over the n components of x. Fig. |4] plots the median of 
these NSE values over the 1000 Monte Carlo trials. The figure 
shows the median NSE for vector dimensions n = 100 and 
500 and j3 — n/m — 2 and 3. 

The points marked "pred (SE)" are the NSE values as 



predicted by the state evolution equations (37 1. We see that 



the state evolution equations, which are theoretically exact for 
infinite n, provide an excellent match (within 0.1 dB) with 
the simulated values when n = 500. At the shorter length 
of n — 100, the SE equations still provide a good match, 
although there is a small steady error of about 0.2 dB when 
/3 = 2 and 0.8 dB when /? = 3. 

In Fig. [4] we plotted the median NSE since there is 
actually considerable variation in the NSE over the random 
realizations of the problem parameters. To illustrate the degree 
of variability, Fig.|5]shows the CDF of the NSE values over the 
1000 Monte Carlo trials. We see that there is a large variation 
in the NSE, especially at the smaller dimension n = 100. 
This means that although the median performance may be 
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Fig. 4. State evolution prediction. The normalized squared-error as predicted 
by the state evolution equations {37) as a function of the iteration number is 
compared against the simulated value for n = 100 and 500. The simulation 
is based on a sparse Gauss-Bernoulli model. See text for details. 
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Fig. 5. Performance variation. Plotted is the simulated CDF of the NSE, 
averaged over the components in the vector x, but accounting for variations 
in the random problem parameters. The CDF for the value of n = 500 shows 
less variation than n = 100 and closer to concentration around the state 
evolution limit (SE lim). 



good, there is still a significant chance that the algorithm could 
perform well below the median on any particular realization. 

As one might expect, at the higher dimension of n — 500, 
the level of variability is reduced and performance begins to 
concentrate around the density evolution limit. However, even 
at n — 500, the variation is not insignificant. As a result, 
caution should be exercised in using the SE predictions with 
short to medium block lengths. 
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Fig. 6. Performance comparison of relaxed BP with other sparse estimation 
methods. 



B. Comparison to Other Sparse Detection Methods 

Fig. [6] plots the squared-error performance of the relaxed 
BP algorithm varying the measurement ratio j3 = n/m and 
holding n = 100. For each value of f3, the points labeled 
"relaxed BP" show the median NSE after 20 iterations of the 
relaxed BP algorithm. 

Also plotted is the theoretical optimal MMSE performance 
as predicted by the lower bound, Theorem [TJc). I n this 
experiment, we observed only one fixed point for the SE 
equations for all the values of (3. So the lower bound on 
the MSE in Theorem [TJc) equals the theoretical asymptotic 
performance of relaxed BP given in Theorem [TJb). We see 
that the relaxed BP algorithm at n = 100 performs very close 
to the asymptotic optimal performance for values of j3 up to 
approximately 2. For larger values of j3, there is a small gap 
between the performance of the relaxed BP algorithm and the 
optimal performance. The gap grows to 0.8 dB at (3 = 3. As 
discussed in Fig. |4] this gap decreases at higher values of n. 

Fig. [6] also shows the performance of two other simple 
algorithms. The top curve is the median NSE for optimal linear 
MMSE estimation, and the curve labeled "lasso" is the MSE 
from the lasso algorithm of J30| . The lasso method is based on 
an l\ -relaxation of the optimal estimator and is widely-used 
for sparse estimation problems in compressed sensing. In this 
experiment, the regularization weighting in the lasso estimator 
was optimized as described in J28) . 

We see that the relaxed BP algorithm offers some gain 
over either of these methods. Of course, with the interest 
in compressed sensing, there is now a plethora of methods 
for estimating sparse vectors. It is likely that other methods, 
including possible modifications of lasso, can obtain a similar 
performance as relaxed BP. A complete comparison of relaxed 
BP against these methods is beyond the scope of this work. 
What is important is that relaxed BP provides a unified, 
systematic method for a large class of problems, such that 
when applied to certain compressed sensing problems, it gives 
near optimal performance. 
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Fig. 7. Relaxed BP algorithm with a Gaussian prior and bounded noise 
output channel. The plot compares the simulated relaxed BP performance 
against the predicted performance based on density evolution. Also shown is 
the performance of the linear MMSE estimator with and without projection 
to the consistent set. 



C. Estimation with Bounded Noise 

To validate the relaxed BP method and analysis for non- 
AWGN output channels, we next considered a bounded, 
uniform noise channel. Specifically, we assumed that the 
output channel is given by ([3]) where the components of 
the noise vector w are i.i.d. and uniformly distributed in an 
interval [—6, S] for some 5 > 0. Among other applications, 
this bounded noise model arises in the study of subtractive 
dithered quantization pT) , p2| , where the uncertainty interval 
corresponds to a quantization region. 

Unfortunately, optimal MMSE estimation with bounded 
uniform noise involves an integration over an n-dimensional 
polytope, which is generally computationally intractable. How- 
ever, the relaxed BP algorithm can be readily applied to the 
relaxed BP problem with bounded noise providing a simple, 
computationally-tractable algorithm for this problem. 

Fig. [7] shows a simulation of the relaxed BP algorithm 
with a bounded uniform output noise channel. The simulation 
used a vector x with n = 50 zero-mean i.i.d. Gaussian 
components. Similar to the previous experiment, we again 
used a measurement matrix $ with Gaussian i.i.d. components. 
Also, bounded uniform noise in the interval [—6, 6] results in 
a noise variance of fi w = <5 2 /3. In this experiment, the noise 



level S was adjusted such that SNRo in ( 46 1 was equal to 10 
dB. We varied the values of the measurement ratio (5 — n/m, 
and for each value of /3, the points labeled "Relaxed BP" in 
Fig.|7]plots the median NSE over 1000 Monte Carlo trials of 
the relaxed BP algorithm, using 20 iterations in each relaxed 
BP run. 

As in the previous experiment, the SE equations have 
a unique fixed point, and thus relaxed BP is theoretically 
asymptotically optimal with a minimum variance predicted by 
the SE fixed point. The curve labeled "Opt MMSE" shows this 



theoretical asymptotic minimum squared error. We see that the 
median squared error of relaxed BP at n = 50 matches the 
theoretical asymptotic performance well. 

Fig. [7] also compares the relaxed BP method to two other 
simple algorithms. One is the linear MMSE estimator, which 
is equivalent to the MMSE estimator assuming Gaussian noise. 
The second estimator, shown in the curve labeled "Linear 
MMSE + Proj," is the linear MMSE estimate followed by 
a projection step. A key observation of the work [33) , p4) is 
that any estimate (including the linear MMSE estimate) can 
be improved by simply projecting the estimate onto the set of 
vectors x consistent with the bounded noise. An estimate x 
is consistent with the noise if ||y — $x||oo < 6. The works 
(33j, (34j show that projecting to a consistent estimate always 
reduces the squared-error and can offer significant gains at 
low values of /? (what is called high oversampling). Similar 
results and algorithms have been reported elsewhere p5)-p7). 
The figure shows that projecting the linear MMSE estimate 
does indeed offer reductions in the squared error, especially for 
small j3. However, the relaxed BP algorithm, in comparison, 
is even better. 

The reason that the relaxed BP algorithm shows a perfor- 
mance improvement over the projected linear MMSE estimate 
is that projecting the linear MMSE estimate will generally 
result in a point only on the boundary of the consistent set. In 
contrast, the relaxed BP algorithm will attempt to find the 
centroid of the consistent region, which will likely have a 
smaller error variance. 

VII. Conclusions 

We have presented an extension to Guo and Wang's relaxed 
BP method in [13] to non-AWGN measurements. The algo- 
rithm applies to a large class of estimation problems involving 
linear mixing and arbitrary separable input and output distri- 
butions. Unlike standard BP, relaxed BP is computationally 
tractable even for dense measurement matrices. Our main 
result shows that, in the large sparse limit, relaxed BP achieves 
the same asymptotic behavior as standard BP as described in 
1 14 1 . In particular, when certain state evolution equations have 
unique fixed points, relaxed BP is mean-square optimal. Given 
the generality of the algorithm, its computational simplicity 
and provable performance guarantees, we believe that relaxed 
BP can have wide ranging applications. We have demonstrated 
the algorithm in two well-known NP-hard problems: com- 
pressed sensing and estimation with bounded noise. 

The main theoretical limitation of the work is that it applies 
to large sparse random matrices, where the density of the 
measurement matrix must grow at a much slower rate than the 
matrix dimension. An interesting avenue of future work would 
be to see if the dense matrix analysis of the AMP algorithm 
in (TTJ and (22| can be extended to relaxed BP. 

Appendix A 
Preliminary Convergence Results 

Before proving our main result, the next number of ap- 
pendices develop some preliminary results. We begin in this 
appendix with some simple extensions to the Law of Large 
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Numbers and the Central Limit Theorem. We omit the proofs 
as the results can be proven with minor modifications to 
standard arguments using characteristic functions |38) . 

Lemma 2 (Modified Law of Large Numbers): For each n 
and d, let x^ i £ K, i = 1, . . . , d be a set of independent 
(though not necessarily identically distributed) random vari- 
ables satisfying 



lim lim : 

d— too n— ^oo 



X ~p x (x), 



where the convergence is in distribution and px(x) is the 
distribution for the limiting random variable x and i = 
i(n,d) £ {1, ...,d} is any deterministic sequence. Assume 
x has bounded second moments. Let 6^ i be a set of non- 
negative deterministic constants such that 



1 d 

lim lim - V^j = 1. 

d— >oo n— >oo et ' 



Then, the limit 



lim lim i ^ ja £ . = E(x) 

holds in distribution. 

Lemma 3 (Modified Central Limit Theorem): Let x^ i be 
as in Lemma [2] such that, for any deterministic sequence of 
indices i = i(n, d) £ {1, . . . , d}, we have the limit 

lim lim y/d\E(xi 2 ) -E(x)| = 0. 

(Z— >-oo n— ^oo ' 

Also, suppose that a„ 4 is a deterministic sequence of scalars 
such that that 



1 d 

lim lim -rV|a^ 



d—too n— ¥oo a ■ 

i=l 

1 d 

J™ Hm ^3?2 l a l'! 3 = - 

d— >oo n— >oo Orl — ' 

»=1 



Then, 



lim lim i J2 <i«i - E (- T )) = ^(0, var(x)) 

V 2=1 

where the limit is in distribution. 



Appendix B 
Genie Algorithm 

As stated earlier, part (c) of Theorem [T] is not new and 
can be found in ]T3) , p4) . Their proof is restated here 
only for completeness. Using similar arguments as [39], their 
proof considers a "genie" or "oracle-aided" algorithm that 
has, as side information, knowledge of certain subsets of the 
components Xj. 

The genie algorithm is defined as follows: In step 1 of the 
relaxed BP algorithm in Section |IV-B| we simply replace the 
initialization (jTVj at t — 1 with 



x 



/4Uj(*) 



(47a) 
(47b) 



where Xj is the true value of the component. Otherwise, all 
the steps of the algorithm are the same as the regular relaxed 
BP algorithm. 

We will see that while the error in the regular BP algorithm 
improves with each iteration, the genie algorithm starts with 
zero error and then increases. The performance of the true op- 
timal estimator is "sandwiched" somewhere between the genie 
and regular relaxed BP algorithms, and thus consideration of 
the regular and relaxed algorithms provide upper and lower 
bounds on the optimal performance. 

Appendix C 
Proof of LemmaQ] 

Fix n > and for r = 0, 1, 2, . . ., define the functions 



(48) 



A r (q) = / x r px(x)4>(q - x; p) dx. 



Then, the conditional mean F in (q,p) and variance £ in (g,/z) 
are given by 



£in(<7,M) = 



A 2 {q) Al(q) 



(49a) 
(49b) 



Mi) Ai( q y 

Now, taking the derivative of the Gaussian distribution <f>(- ; p) 
in Q, it is easily verified that 

jr<t>(Q - x ; m) = - — -4>(q -x; n). 

oq pt 



Bringing this derivative inside the integral ( |48[ ) then shows that 

dA r {q) 1 



(A r+l (q) - qA r (q)) 



(50) 



dq (i 
Applying ( |50| to ( j49| ) we obtain 

dFin(q,iA) = d A 1 (q) 
dq dqA (q) 

(A 2 ( q ) - q A 1 { q ))A Q {i) - iMd) - qMq))Mq) 



Mo(<?) 



A 2 (q) Al(q) _ 1 



= -£in(g,M)- 



Mo(<?) Mo (q) l l 

Appendix D 

Computation Subgraphs and Local Tree-like 
Properties 

An essential assumption of the large sparse limit analysis 
is that the Tanner graph is locally tree-like. To describe this 
property more precisely, we review some standard definitions 
and results that can be found in any description of BP such 
as J40) . Consider the Tanner graph G for the linear mixing 
problem defined in Section III For t = 0, 1, 2, . . ., recursively 



define a sequence of computation subgraphs, Gj(t), Gi(t), 
Git-j(t) and Gi^j(t), as follows: 

1) Initialize: Set t = 1, and for all £ E let Gn^j(t) 
and Gj(t) be the empty subgraphs. 

2) Output update: For all £ E, let Gj_^-(t) be the 
subgraph containing, for all r £ N out (i) ^ j: 

(a) the edges (i, r); and 
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(b) the subgraphs Gi^ r (t). 

Similarly, define the subgraph Gi(t) to be the subgraph 
using all r £ N out (i). 
3) Input update: For all G E, let Gi^j(t + 1) be 

the subgraph containing the node Xj and for all I g 

(a) the edges (£,j); 

(b) the output nodes yf, and 

(c) the subgraphs G^j{t). 

Similarly, define the subgraph Gj(t) to be the subgraph 
using all I € Ni n (j). Set t = t + 1 and return to step 1. 

Now let Hi-tjty) be the sigma algebra generated by 
the components yi contained in the computation subgraph 
Gi^j(t). Similarly, let Hi+-j(t) be the sigma algebra gen- 
erated by the components ye contained in the computation 
subgraph Gi^j(t). To analyze the BP algorithm with the "ge- 
nie" initialization in Appendix B let Hf™j C (t) and Hf^ e (t) 
be respectively the sigma algebras generated by the entire 
vector y and the variable x r not in the computation subgraphs 
@i-yj(t) an d Gi^j(t). The following results are standard for 
BP and can be proven using arguments as in pT) . 

Lemma 4: Consider the sigma algebras i?j^_j(t) 
Hi_).j(t) defined above, 
(a) 



A. Derivatives of the Score Function 

The following lemma characterizes the derivatives 



D r (y,z,/j,) in (16 1. The result can also be found in [14], but 
we sketch the proof here for completeness. We will use this 
result below in the analysis of the output node update. 

Lemma 7: Fix z and p, and consider random variables y 
and z generated by z ~ Af(z + u,p) and y ~ Py\z(u\ z ) f° r 
some net. Consider the derivative of the score function in 
Then, 

E[D-i(y,z,p,)\u\ = -uE[D 2 (y,z,n)\u = 0] 

+ 0{u 2 ), (51a) 
var [Di(y,z, it) | u] = E [D 2 (y, z, (i)\u — 0] 

+ 0{u 2 ). (51b) 

Proof: To simplify the notation, we will drop the depen- 
dence on z and \i. So, we will write, for example, D\{y) for 
Di{y,z,p). To prove ( 5 la i, we first note that 



and 



E[#i(y)M 



PY\u(y\u)D 1 (y)dy 



In the standard BP algorithm in Section III the distribu- 
tions pf^j(t,Xj) are Hi<-j(t) measurable, and the dis- 
tribution p*_^j{t,Zi) and likelihood function p^_^-{t,Uj) 
are H^At) measurable. 



PY\u{y\0)Di(y) dy 

d_ 

du 



- " / —PY\u{y\u) 



Dx{y) dy + 0(u 2 ). (52) 



(b) In the relaxed BP algorithm, x^j (t) and q^j (t) Now, for the first term in (|52jl, note that using the definition 

are Hi^jit) measurable and 2i_>j(i) and u^jif) are 

Hi^j(t) measurable. 
Similarly, under the oracle initialization described in Appendix 
[b] the above statements hold with Hi^j(t) and Hi^j(t) 



of Di(y) in (16 1, we have 

PY\u(y\0)Di(y)dy 



replaced by Hf^f{t) and Hf™ e (t). 

Lemma 5: Consider the standard BP algorithm in Section 
[Hi] and the computation subgraphs defined above. 

(a) If Gi^j(t) is a tree, then pf<_j(t,Xj) is the conditional 
distribution of Xj given Hi^j(t). 

(b) If Gi^j(t) is a tree, then p?^.j(t,Zi) is the conditional 
distribution of z^j given H^^t). 

Similarly, under the genie initialization, the above statements 
hold with Hi^j(t) and Hi^j(t) replaced by Hf"" c (t) and 



- / PY\u(y\fy Q^\ogp Y \u(y\u) 



dy 



d_ 

du 



py\u(v\u) 



dy 



0^ / PY\u{y\u)dy\ u=0 



(53) 



Lemma 6: Consider the relaxed BP algorithm in Section IV 
under the assumptions in Section [V] 

(a) If Gi<-j(t) is a tree, then all the terms (%_ > .j(t), iifL^i)) 
are independent for different values £ € N[ n (j) ^ i. 

(b) If Gi-yj(t) is a tree, then the random vectors 6f<_ r (t) are 
independent for different values r G N out (i) ^ j. 

Appendix E 
Proof of Theorem[TJa) 

We prove this by induction. It is clear that ( |42a| > holds for 
t = 1. In part B below we will show that if ( |42a[ ) holds for 
some t, then so does ( |42c| > and ( |42d| >. Then, in part C, we will 
show that if ( |42c| i holds for some t, then ( |42a| i holds for t + 1. 
This will complete the induction argument. Part A provides a 
preliminary calculation that we need in part C. 



Similarly, the second term in ( |52| > can be simplified by eval- 
uating the second-order derivative in the definition of D2(y) 
to obtain 



E[D 2 (y)\u = 0] 
1 

'I 



, ,,,, i o- Py\u(v\u)\ 



dy 



9 



fa2 PY\u(y\u)\ u=Q dy. 



(54) 



Now, 



d 2 . 
fa2 PY\u(y\u)\ u=0 dy 

du 2 



PY\u(y\u)\ u=0 dy = — (1) = 0. (55) 



RANGAN 



13 



Also, 



d 



PY\u{y\0) \du 



PY\u(y\u)\ u=0 ) dy 



We next consider the convergence of the variables z^j (t) 
If we define z^j as in Q, then ( [T8] > and (58 1 show that 

i-yjW 



d_ 

du 



d 

PY\u(y\u)\ u=Q ^ logpY\u(y\u)\ u=0 dy 



E 



d_ 



PY\u(y\u)\ u=0 D^y) dy. 



(56) 



1 

Vd 



E 



(I-; 



&i^—r ) 
-r). (61) 



Substituting (55i and (56i into (54i, we obtain that 

d 



E[D 2 (y)\u = 0] 



du 



PY\u(y\u)\ u=0 Di(y)dy. (57) 



Then (|51a[> follows by substituting <[53j and (|57) into (52 
Equation (51b i is proved by similar manipulations. I 



By Lemma |6|b) the terms in the summation ( |6T| > are indepen- 
dent. Also assuming ( |42a| i holds, the terms in the summation 
converge as 

lim lim -Js^(x r — x^ r ) — y/s(x — Fi n (q, — l)/s)), 

d— ¥oo n— >oo 

where the convergence is in distribution and the random 



B. Analysis of the Output Node Update 



variables x, s and q are the terms in 9 x {t) in (39 1. The 
variances of the terms converge as 

lim lim E [s r |x r — rE^,-! 2 ] 

d— >oo n— >ao 



Let t > 1 and suppose that d42ab holds for some t. We will - E [ s \ x ~ F ^ V q {t - 1)/«)| 2 ] - £m{(i 9 (t - !))• 



show that this induction hypothesis implies (42ci and (42di 



We will just prove this implication for t > 1. The proof for 
t = 1 is similar. 

Under the induction hypothesis ( |42a| i, we first consider the 
convergence of the terms /if_^ (i). From the factorization (28 1 
we have that 



Using ( 30a i, ( 30b i and d60b, we can apply the Modified Central 
Limit Theorem (Lemmaj3j) to ( |oT| to obtain 

lim ]im Zi^j -Zi^j(t) =Af(0,fi z (t)), (62) 

d— >co n—yoo 

where the convergence is in distribution. 
Similarly one can show that 



$ij = ^qa ijy /s~, Vj e N out (i). 



(58) 



lim lim z. 

d— >oo n— >oo 



AA(0, M f nit ) 



Using ( TT8| > and ( |58| ), we have that 



and 



E 

1 

d 



lim lim {z^j - %_>.,• (t))2j_>,-(t) = 0, 

d— ¥oo n— >oo 



(63a) 



(63b) 



E 

rGJV out (i)^i 



a ir | 2 s r ^ r (i). 



(59) 



where /if nit is defined in ( |33] l. Equations ( |62[ l and (63i imply 
that 

lim lim (z^,-, = (z,z), (64) 

d— i-co n— >oo 



By Lemma |6|b) and the assumption that Gi^jit) is a tree, 
the terms in the summation in (59 1 are independent. Also, 
the induction hypothesis ( |42a[ ) shows that their asymptotic 
distribution is given by 

lim lim s r p,? (t) — s£ m {q, fi q (t — l)/s), 

d— >oo n— >oo 

where the convergence is in distribution and the random 



where z and z are the Gaussian random variables in (35 1 with 
p, = fi z (t). Combining ( 60 » and (64i proves (42c i. 



This argument shows that if (42ai is true for some t, then 



variables s and q are the terms in 6 x (t) in (39 1. For the regular 
relaxed BP algorithm [i q (t — 1) = — 1) and, for the genie 
algorithm, 1) = 1). In either case, the expectation 

of this limiting random variable is 

E [s£ in (q, ^(t - l)/s)} = £ in (p, q (t - 1)), 



where £i n (fi q (t — 1)) is defined in (32i. Using (BOci, we can 



apply the Modified Law of Large Numbers (Lemma 
sum in 091 to obtain 



so is (42c i. A similar argument shows that (42ai also implies 
( |42d| >, except we replace the summations over the sets r € 
N ut(i)^j with r G N out (i). 

C. Analysis of the Input Node Update 

For the next step in the induction proof, we want to prove 
that if \A2c\ holds for some t, then ( |42a| > holds for t + 1. 

Throughout this section, fix the input index j and variables 
Sj and Xj. For each output index £ G N- ln (j) and meR, define 
the Markov chain 

zf -> zf{u) -> yf (it), 
where the random variables are distributed as 



2b, to the 



z e 



lim lim M ? (t) = P£ in (p q (t - 1)) = M *(t), (60) 

d— >oo n— >oo J 

where the convergence is in distribution and the last step 



zf(u) 

y?(u) 



JV(0,/4i t -/**(*)) 

PY\z{y\zf{u)). 



follows from the definition of fi z (t) in (37 i. 



Suppose the Markov chains are independent over different 
values of £. We use the superscript "G" here to indicate that 
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the random variables are Gaussian approximations to actual 
random variables in the problems. To be specific, first observe 
that 



4>/ 



1 



a t j 



We A out (j). 



Combining ( 66 1 with the definition of z^j in (|9| and the fact 
that z = $x, we have 



where 



z t = ^ ®li x i = u i + z i^j, 

r€N out (£) 



1 



The induction hypothesis ( |42c[ ) and Lemma |6|a) then show 
that 



lim lim j(t),z t ,yi, At)) 

d— >oo n— foo J 



(69) 



where the convergence is in distribution. 

With these definitions, we first consider the convergence of 
Hi^j{t). Using < j!9b} , (20b I and (66 1, we have that 



E 

= J E \a ej \ 2 s J D 2 (y e ,z l ,^ 1 (t), t iU 1 (t)).(70) 

By Lemma [6|a), given Xj and Sj, all the terms in ( f70] > are 
independent. 

Also, using (|69|, we have the limit 



lim lim D2{yt,Zi->j(t),fif_+At)) 



(a) 
(b) 
(c) 



D 2 (yf(0),zf(0),^(t))+O(\u e \ 3 ) 



^ D 2 (^(0),2f(0),//(t)) 



(71) 



where the convergence in (a) is in distribution; (b) follows 
from the assumption that Dg(-) is uniformly bounded and (c) 
follows from the fact that (68 1 shows that \ui\ 3 — 0(d~ 3 / 2 ) -» 



0. We can apply the Modified Law of Large Numbers (Lemma 
[2| to the sum ( f70| ) to obtain the limit 

lim lim 



== lim lim — 

(i->oo n-too a 



xD 2 (y«(Q),z?(0),n*(t)) 



(6) 



(c) 



Sj B[D 2 (yf(0),zf(0),^(t))} 
Sj £ ou An Z (t)) = 



(72) 



where the limit in (a) is in distribution and follows from ( |70| ) 
and ( |7T| ; (b) follows from ( |30c| i and the Modified Law of 
Large Numbers; (c) follows from the definition of £ ut(') 



is identical to (yf (0), zf (0)); and (d) follows from the SE 
equation (|37a[). 



We next turn to the distribution of qi<_j(t). Using (19 1, the 



(66) update ( 20a i can be simplified to 



Qii-j(t) 



(67) 



(68) lim lim (t) 



-^LiW E ^^(W. W*)> **?-*(*))■ (73) 



So, using d69} and d72 



d— >oo n— >oo 



/i«(t) 



lim Vs^Difof (u*),5? M,M 2 W), (74) 



where here and below we use the shorthand lim^n for 
limrf lim„ . Now define 

e t = Di(yf(y,i), if {u e ), fi z (t)) 

+ $t jXj E [D 2 {yf (0),2? (0),//(*))] , (75) 
so we can rewrite |74} as 
lim lim q^j(t) 

d— too n— >oo 



lim y^—^-et 

lim ^ A*«(*)l*«l a *J E [^(^(0)^(0),^^))] 
'Y]a} i e e +Xj, (76) 



M «(t) 



= — lim 



where the last step follows from ( 74 1 and (58 I. 
Now, applying Lemma [7] to eg in ( 75 I, 

E(e f K) = 0(M 2 ) 
var(e,M = E [D 2 (yf(0),zf(0), n z {t))] + 0(\ug\ 2 ) 

= ]k) + ° M) - 

From the definition of ug in |68| >, the 0(|m£| 2 ) terms are 
0(1/ d) and thus can be ignored. Applying the Modified 
Central Limit Theorem (Lemma [3]l to the sum in ( 76 1 along 
with (|30cb we obtain 



( U 1(t)) 2 ( 1 

lim lim $<_,-(*) = Xj + A/" I 0, 



A/" a:,-, 



M 9 W 



(77) 



Also since Xj ~ px(xj) and Sj ~ Ps(sj), (72i and (77i 
together show that for any iteration i, 

lim lim (xj , sj , <&<__,- (t) , /if , (t) ) 

a— J-oo n— ^-oo J 

= (x, s,Af (x, J/i?Wj , . (78) 

where the convergence is in distribution, x ~ (x) and s ~ 



in (34 1 and the fact that the expectation over (y, z) in (34i Ps(s). Applying (21 1 to (78 1 shows (42a I. Therefore, we have 
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shown that if ( j42c| ) holds for t, then ( |42a| > holds for t+1. One 
can also show that if ( |42c| > holds for i, then ( |42b| > holds for t+1 
using similar arguments except we replace the summations 
over I G iV in (j) ^ * with * G N-Jj). 

Appendix F 
Proof of Theorem[T];b) and (c) 

Parts (b) and (c) of Theorem [T] can be proven along the 
lines of Guo and Wang's analysis in fT3) and 1 14 1 using the 
concept of an asymptotically sufficient statistic along with 
a standard sandwiching argument. Specifically, using their 
analysis, we will show that the regular and "genie" versions of 
the relaxed BP algorithm provide sufficient statistics for certain 
conditional distribution of the vectors x and z. The regular BP 
algorithm provides a sufficient statistic relative to the sigma 
algebras Hi<_j(t) and Hi^j(t), and the "genie" relaxed BP 
algorithm in Appendix [B] provides a sufficient statistic relative 
to Hf^ e (t) and Hf™ c (t). Moreover, the MSE relative to the 
sigma algebras is described by the state evolution equations 
starting from the "high" initial conditions for the regular 
algorithm and "low" initial condition for the genie algorithm. 
Since the sigma algebra generated by the actual observation 
vector y lies somewhere between these sigma algebras, H and 
H gcmc , the MSE of the optimal estimator is "sandwiched" 
between the two solutions to the state evolution equations. 

Since the arguments in this section follow very closely with 
Guo and Wang's analysis in fT3] , fl4| , we will just sketch the 
proof. Similar sandwiching arguments can be found in the 
early analysis of LDPC codes in [ 39 1 . 

We begin with the following definition. 

Definition 1: Suppose that (x n ,q n ,H n ) is a sequence 
where, for every n, x n and q n are random variables and H n 
is a sigma algebra. We will say that q n is an asymptotically 
sufficient statistic for x n given H n with limiting distribution 
(x n ,q n ) -t (x,q) if: 

(a) q n is immeasurable; 

(b) (x n ,q n ) ~ > (x,q) in distribution; 

(c) For any bounded continuous function f(x), 

lim (E(/(ar„)|fr n ) - E(/(i)|g - q n )) = 

n— >oo 

almost surely. 

The definition is a natural generalization of the concept of 
a sufficient statistic. Specifically, it says that the conditional 
estimate E(f(x n )\H n ) can be replaced by ~E(f(x)\q = q n ) 
with asymptotically vanishing error. That is, it is sufficient to 
use just q n instead of the entire sigma algebra H n and use 
just the limiting distribution (x, q) as opposed to the termwise 
distributions (x n ,q n ). 

Following along the lines of Guo and Wang fl3) , p4) , we 
now prove the following. 

Theorem 3: For the relaxed BP algorithm: 
(a) If Gi-yj(t) is a tree, then %^j(t) is an asymptotically 
sufficient statistic for Zi-yj given Hi-yj(t) with the 



asymptotic distribution ( |42c[ ). 
(b) If Gij-jif) is a tree, then (q i4 _j{t), sj) is an asymptot- 
ically sufficient statistic for Xj given H^jit") with the 



The result also holds for the "genie" algorithm in Ap- 
pendix B with Hi-).j(t) and Hi<-j(t) replaced by Hf™ e (t) 
and H U '!h. 

Similar to the proof of Theorem [TJa), we prove Theorem [3] 
by induction. For the initial step in the induction, note that, 
for the regular (non-Genie) algorithm, Hi-yj(t) is empty 
and Zi-yj(l) is the prior on Zi-yj. For the genie algorithm, 
Hf™j°(t) contains the entire vector x and (1) = z^j. 
Therefore, part (a) of Theorem [3] holds for t = 1. In part A 
below, we will show that if (a) holds for some t, (b) holds for 
t + 1. In part B, we will show the reverse implication that if 
(b) holds for some t so does (a). In part C, we apply Theorem 
[3] to prove Theorem [TJb) and (c). 

A. Analysis of the Input Node Update 

Suppose that part (a) of Theorem [3] holds for some t > 
1. We will prove part (b) holds for t+1. The asymptotic 
limit f78} has already been proven. We only need to show 
that (<7j<_j(t), Sj) is asymptotically sufficient to describe the 
conditional distribution of Xj given H^jit + 1). 

To this end, suppose that Gi^j(t + 1) is a tree. By the 
construction of the computation subgraphs, Gi-yj (i) must be 
a tree for every I £ N in (j), £ ^ j. Now define, for any r > 1, 
the "actual" derivatives of the likelihood 



D 



QV 



logp^.,(t,w) 



(79) 



where pf.j{t,v) is defined in (111. Since Ge-+j(t) is a tree, 
Lemma |5] shows that (i, zt 
distribution 
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is the conditional 



given H£-yj(t). Bringing the derivatives in 



( 79 1 inside the expectation in ( flT) we can rewrite ( 79 1 as 



r.act 



(t,vi) 



-E 



Qv 



^ogp Y \z(yi\u + z e ^j)\ u=Q | He^j(t) 



(80) 



where the expectation is over the conditional distribution of 
Zi->.j given H^j(t). Also, using ( 15 1 and ( 16 1, we can write 



D r (y,z,p) 



E 



— logp Y \z(yi\u + z)\ u=0 I Z,{1 



(81) 



where the expectation is over z ~ Af(z, p). 

Now, the induction hypothesis, Theorem [3|a), states that 
zi^,j(t) is asymptotically sufficient for zi-yj given Hi^j(t) 
with the asymptotic distribution 

lim lim {zt-+j,z t -+j{t)) =7V(0,P z (M z (i)), 

d— too n->oo 

where /i z (t) = for the regular algorithm and fJb z {t) = 

/if Q (t) for the "genie algorithm". Applying this property to 
dHOl to (TSTb, we obtain that 



asymptotic distribution ( 78 i 



lim lim Di%{t,y t ) - D r ( yi , z t ^{t), p*(t)) = 0, (82) 
almost surely. 
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We can now rewrite pf^j(t + 1, Xj) in ( 13 1 as 

log Pi<- j(t + 1, xj) + log px(xj) + const 



lim lim - 

d— >oo n— too 



(a) 



(&) 



(c) 



lim lim V logp^. (t, $^£Bj) 

d— !-oo n— s-oo * — ' J 

lim lim V D e r ^ t (t,y e )$ ej Xj 

d— too n— too 



lim 

d,n—>oo 



i^t&ytmejXjf + 0(\$ ej xj\ 3 ) 

D T (y e , z t -> j(t),/j, z (t))$ J (> j Xj 



E 



eeN in (j)^i 
^D r (y e ,z e ^(t), f i z (t))\^ ej x^ 



= lim 



(£) 



d,n-Kx> 2pfl(t) ° 



Qii-j 



Wl 2 



where the constant is independent of Xj\ (a) follows from ( [13) ; 
(b) is the Taylor's series expansion of logp"^ ■(£, QijXj); (c) 
follows from ((82) and pOd) ; (d) follows from ((T9) and @ 
and (e) follows from ((72). Here, with some abuse of notation, 
we have written lim A = lim£> in place of lim(A — B) = 0. 
Using this same convention, the above equations show that 



lim lim pf, jft + l,Xj) 

d— too n-+oo J 

= lim lim const 



xp x (xj)eiq> 



2^j(t) 



\xj -qi^j{t)f 



(83) 



From Lemma [5] the left hand side of ( (83) precisely the 
conditional distribution of Xj given Hi^j(t+1) (or Hf™j e (t)). 
Therefore, ( (83) shows that this conditional distribution is 
asymptotically only a function of %^j(t) and Sj, and there- 
fore Sj) is asymptotically sufficient for given 
ffi^(t+l). 



B. Analysis of the Output Update 

Continuing the induction argument, we next show that if 
the part (b) of Theorem [3] holds for some t, then so does part 
(a). We have already proven the asymptotic distribution (42c i. 



So, we just need to show that the conditional distribution of 
Zi-yj given Hi->j(t) asymptotically depends only on %^j{t). 

Now, from Lemma [5] the conditional distribution of z^j 
given Hi_yj(t) is given by p*_^(i, Zi-yj) from the BP algo- 
rithm. But this distribution is described by the summation |9). 
The analysis in Appendix E-B shows that this summation has 



an asymptotic Gaussian distribution Af(zi^j(t), n z (t)). So, 
%^,j{t) is asymptotically sufficient to describe the distribution. 

This completes the induction argument and proves Theo- 
rem [3] 



C. MSE Relationships 

Using Theorem [3] we can now prove parts (b) and (c) 
of Theorem [T] First observe that Theorem [3] along with the 



definition of an asymptotically sufficient statistic shows that 



lim lim F,(xj\Hi^j(t)) 



d—too n— too 
(a) 



(t + 1) 



lim lim E(x\q = qi^j(t),s = Sj) 



d— too n— too 



(t+l) 



(b) 



(c) 



lim lim F- ln (qi^j(t), /U 9 (t)/ Sj) - Xi^j{t + 1) 

d— too n— too 



(84) 
where in (a) the expectation is with respect to (x,q,s) dis- 



tributed as (78 i; (b) follows from the definition of F m (-) in 
Section |IV-A[ and (c) follows from ((21) and ((78). The limit 
( 84 1 shows that the conditional variance is given by 



lim lim E (var fx, • | £?»<_,• (i))|s,- = s) 

d— too n— too 



lim lim E (\xj - E(a; ? -|ff^ 7 -(t))| 2 |s, = s) 



d— s-oo n— voo 



(a) 
(*>) 



= lim lim E (\xj — x^j 

d— too n-^too 



it)? 



Edx-Fin^/x^W/s)! 5 
E(£ ia ( 9 , M ^)/ S )| S ) 



( = } £ in ( M 9 (i),s) ( =V(* + M), 



(85) 



where (a) is due to the limit d84); (b) is due to the limit d78); (c) 



is the definition of £- m (q,n); (d) follows from (32 1; and (e) is 
from ( |37b) . The limit ( |85) holds for the regular algorithm with 
/i x (t+l,s) = s) and for genie algorithm with [i x (t+ 

1, s) = Mk>(^ + 1; s )- With the regular (non-genie) algorithm, 
the limit ( |85) shows ((43a). Also, for the genie algorithm, the 
sigma algebra Hf™- C (t) is contains the sigma generated by 
just y. Therefore, 

var(^|y)>var(^| J ffft7(*)). 



Combining this inequality with ( |85) shows ( |44a) . 

A similar argument can be used to show ( 43b) and (44b i. 
We have thus shown part (b) and (c) of Theorem [T] 

Appendix G 
Proof of Theorem|2] 

The proof is based on a degradation argument, which is 
used commonly for convergence proofs of BP algorithms 
pO) . Suppose that X — > Y -» Z is a Markov chain. 
Then, we say that Z is degraded with respect to Y, since 
estimates of X from Z are strictly worse than those from Y. 
The following lemma states a standard property of degraded 
random variables. 

Lemma 8: Suppose that X — > Y — » Z is a Markov chain. 
Then, 

(a) The conditional variance of X satisfies 

\ar(X\Y) < \ar(X\Z). 

(b) Suppose the likelihood function of X given Y and the 
likelihood of X given Z both have continuous third 
derivatives. Then, for any x, 

F{Z\X =x)< F(Y\X = x) 
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where, for any random variables X and W, F(W\X = x) 
is the Fisher information 

d 2 

E Q^\ gPw\x{w\x) X 



F(W\X 



(86) 

Proof: See, for example, [42]. ■ 
We will combine this lemma with the following simple 

iteration result to prove the theorem. 

Lemma 9: Suppose G(/i) is a monotonically increasing, 

continuous function with < G(/i) < ii max for all /i and 

SOme Umax- 

(a) Consider the sequence fi(t + 1) = G(/i(t)) initialized 
with /i(l) = fi max . Then 

lim /i(£) = fi, 

for some /i with /i = G(/i). Moreover, the limiting 
value [i is the largest value satisfying /i = G(/i) and 
e [0,/it mox ]. 

(b) Similarly, if the above sequence is initialized with fi(l) = 
0, then fi(t) — > fj, where /i is the smallest value satisfying 
[i £ [0,/i maK ] and the fixed point equation /i = G(/i). 

Proof: We will just prove part (a) as part (b) is similar. 
We first prove by, induction, that fi(t + l) < /i(i) for all t. For 
t = 1, since is initialized to fi max and G(/i) < /imax for 
all \i, we have that 

Mi). 

Now suppose that + 1) < /i(t) for some i. Then, using 
the monotonicity of G(/x), 

H(t + 2) = G(p(t + 1)) < G(n(t)) = fi(t + 1). 

So, by induction, /i(t) is a monotonically decreasing sequence. 
Since it is bounded below by zero, it must converge to some 
/i. By the continuity of G(/i), the limit point must satisfy the 
fixed point equation /i = G(fi). 

It remains to show that the limiting value // is the largest 
fixed point of G in the interval [0, /i mQX ]. To this end, let /zi 
be any fixed point [i\ = G(/ii) with < [i\ < Umax- Then, 
/i(l) = Umax > Mi. Also, if > fix, by the monotonicity 
ofG, 

fi(t + i) = G{n(t)) >m. 

So, by the induction the entire sequence /i(i) > [i\. Taking 
the limits as t — > oo, we have that fi> fix. Hence fi> /ii for 
any other fixed point of G. ■ 
We can now prove Theorem [2] Define the function 

G(n) = <3£ in \£ ont ( f i)] , (87) 



so that we can rewrite the density evolution equation (38 i as 



//(i+l) = G( M *(t)). 

We will now apply Lemma [9] to show that fi z (t) — > fi to a 
fixed point /i = G(/i). We first upper bound G(/i). For all 
fi > 0, 

E [& m {q, fj,)} ( => var(X|Q) < var(X) ^ /if nit . 



where the expectation is over the random variable q = x + v, 
v ~ A/"(0, /i); (a) follows from the definition of £i n (q, m); (b) is 
the fact that conditioning cannot increase the variance; and (c) 



is from the definition of /if nit in ( 17 1. Therefore, the definition 
of £i n (n) in (32 1 implies that 



£ in (n) = E [s£ in (q, fi/s)] < E(s)m 



init • 



As a result, G(fi) defined in (87i satisfies 

G( M ) < /3E( S )/4 it = M f nit) 



where /if nit is defined in (33 i. So, we have that G(/i) < /if nit 



for all /i. Also, the "high" sequence is initialized with 

Athi(l) = Minit anc * tne "l° w " sequence with ^f (l) = So, 
we will apply Lemma [9] with [i max = jjf nit . 

By the assumption of the theorem, £- m (ii) and £ ut(^) are 
continuous. Therefore, so is G(n). 

Hence, to apply Lemma [9] it remains to show that G(/i) 
is monotonically increasing. From (87i, we need to show that 



£i n (li) and £ ut(M) are monotonically increasing. 

We first consider £ ln (ii). Let /i 2 > Mi and define the 
random variables 

qi = x + vi, vx ~ Af(0,iii/s), 

q 2 = qi+w, w~ jV(0, (/i 2 - fi x )/s), 

where x ~ px(x), s ~ ps(s)> an d «i and «2 are independent. 
We have that x — > gi — >• (/2 is a Markov chain, so Lemma 
|HJa) shows that, for all s, 



var(X\Q 1 ,S = s) <\ar{X\Q 2 ,S = s). 



(88) 



Also, <72 is identically distributed to q 2 = x + v 2 , v 2 ~ 
Af(0,fi 2 /s) for some U2 independent of x. The definition of 
£ in(M) shows that for i = 1, 2, 



S in (/i i )=E[svar(X|Qi,5 = s)] 



(89) 



Combining (881 and (89 1 shows that £i a (fi) is monotonically 



increasing in /1. 

The proof that £ ut(n) is monotonically increasing is sim- 
ilar. Let iii and [i 2 be variances such that 

< Mi < < Mftut- 
For «eK, define the random variables 

z 2 - Af(0,/4a t -/x 2 ) 
zi - z 2 + A/"(0,^ 2 - Mi) 

z - U + Zl + A/"(0,/!l) 

where all the Gaussian random variables are independent. 
Also, conditional on z, let y have the distribution y ~ 
PY|z(y| z )- It can t> e verified that 

w -> (Si, y) -> (^2,2/) 

is a Markov chain. It follows from Lemma [HJb) that 

F(^i,r|^ = 0)>F(Z 2 ,F|^ = 0). (90) 

Also, the definitions of z, % and z 2 above show that, for 
i = 1,2, when u = 0, 

(«,5i) ~-A/'(0 > P,( w )), 
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where P z {n) is defined in (36i. 

Now, the Fisher information satisfies 



F(Z h Y\U 
S -E 



0) 

d 2 



(<*) 



E 



1 



(91) 



where (a) follows from the definition of the Fisher information 
in ((86]); (b) follows from the fact that 

l °gPz z ,Y\u&>y\ u ) 

= logp Yl z uU (y\u,Zi) +logp2 ilu (%\u) 

and Zi is independent of u; (c) is the definition of D r (y, z. u) 
in ( 16 1; and (d) follows from the definition of £ out (/i) in (34i. 



Equations ( 90 » and (34i together now show that £ O ut(/-0 is 



monotonically increasing in fx. Since £i n (/i) is also monoton- 
ically increasing in /i, so is G(/i). 

Lemma [9] thus shows that converges to the largest 

fixed point solution of the equation fi = G(/i) and /xf (t) 
converges to the smallest fixed point. 
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